FEAT: New Audio Converters#1375

Open
petebryan wants to merge 14 commits into Azure:main from petebryan:pebryan_audio

Conversation

@petebryan
Contributor

Description

Added new audio converters that do the following:

  • Change the speed of an audio file without altering pitch (AudioSpeedConverter)
  • Add white noise over an existing audio file (AudioWhiteNoiseConverter)
  • Add an echo to an existing audio file (AudioEchoConverter)
  • Adjust the volume of an audio file by scaling the amplitude (AudioVolumeConverter)
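
For readers unfamiliar with these transformations, here is a minimal NumPy sketch of three of them (volume, white noise, echo). It is not the code in this PR: the function names and parameters are hypothetical, and it assumes the samples are floats already normalized to [-1.0, 1.0]. Changing speed without altering pitch needs a time-stretching algorithm and is left out.

```python
import numpy as np


def scale_volume(samples: np.ndarray, factor: float) -> np.ndarray:
    """Multiply the amplitude by `factor` and clip back into [-1.0, 1.0]."""
    return np.clip(samples * factor, -1.0, 1.0).astype(samples.dtype)


def add_white_noise(samples: np.ndarray, noise_scale: float, seed=None) -> np.ndarray:
    """Overlay Gaussian white noise with standard deviation `noise_scale`."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_scale, size=samples.shape)
    return np.clip(samples + noise, -1.0, 1.0).astype(samples.dtype)


def add_echo(samples: np.ndarray, sample_rate: int, delay_seconds: float, decay: float) -> np.ndarray:
    """Mix in a single delayed, attenuated copy of the signal."""
    delay = int(round(delay_seconds * sample_rate))  # delay in samples
    out = samples.astype(np.float64)  # astype returns a copy, so the input is untouched
    if 0 < delay < len(samples):
        out[delay:] += decay * samples[:-delay]
    return np.clip(out, -1.0, 1.0).astype(samples.dtype)
```

Clipping after each operation keeps the result inside the valid amplitude range, at the cost of distortion when the factor or noise level is large.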

Added a new translation converter, MultiLanguageTranslationConverter, to allow mid-sentence language switching in a prompt.

  • Distinct from RandomTranslationConverter in that it focuses on segment-level granularity and deterministic translation.
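
To illustrate what segment-level, deterministic switching could look like, here is a rough sketch. It is an assumption about the general idea, not the PR's implementation: `split_into_segments`, `translate_segments`, and the `translate` callback are hypothetical names, and the real converter sends each segment to an LLM target rather than calling a local function.

```python
from typing import Callable, List


def split_into_segments(prompt: str, count: int) -> List[str]:
    """Split the prompt's words into `count` contiguous, roughly equal segments."""
    words = prompt.split()
    count = max(1, min(count, len(words)))  # never more segments than words
    base, extra = divmod(len(words), count)
    segments, start = [], 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)
        segments.append(" ".join(words[start:end]))
        start = end
    return segments


def translate_segments(
    prompt: str,
    languages: List[str],
    translate: Callable[[str, str], str],
) -> str:
    """Translate segment i into languages[i]; the same input always gives the same mapping."""
    if not languages:
        raise ValueError("At least one language is required")
    segments = split_into_segments(prompt, len(languages))
    # zip stops at the shorter list, so short prompts use only the first few languages
    return " ".join(translate(seg, lang) for seg, lang in zip(segments, languages))
```

With a stand-in translator such as `lambda seg, lang: f"[{lang}] {seg}"`, a five-word prompt and two languages yields a three-word segment followed by a two-word segment.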

Updated AzureSpeechTextToAudioConverter so that an audio file input is passed back out unchanged. This covers cases where the converters are used with conversation history that includes mixed audio and text Messages, which would otherwise throw exceptions.

Sorry I did not raise an issue for this ahead of time; experimentation with ideas turned into code and I wanted to contribute. Happy to refactor however is deemed best.

Tests and Documentation

  • Added unit tests for all converters to test functionality and ensure audio transformations do not adversely affect audio files. (58 unit tests)
  • Updated AzureSpeechTextToAudioConverter tests to cover the case where audio_file is provided as input after the update. (1 unit test)
  • Updated the converter documentation .py files to reflect these updates, then ran jupytext --execute --to notebook to generate notebooks.

@petebryan petebryan marked this pull request as draft February 18, 2026 03:53
@petebryan petebryan marked this pull request as ready for review February 18, 2026 17:40
@petebryan petebryan changed the title [DRAFT] FEAT - New Audio Convertors FEAT: New Audio Convertors Feb 18, 2026
@romanlutz romanlutz changed the title FEAT: New Audio Convertors FEAT: New Audio Converters Feb 20, 2026
Comment on lines +128 to +133
if not self.input_supported(input_type):
    raise ValueError("Input type not supported")

# If the input is already an audio path, pass it through unchanged.
if input_type == "audio_path":
    return ConverterResult(output_text=prompt, output_type="audio_path")
Contributor

You must be thinking of a use case I am unable to anticipate 🙂 Can you elaborate?

Contributor Author

Sometimes you want to generate attacks that include previous turns and then add new turns on top. The problem arises when those previous turns were audio and the new turn is based on a text prompt: you then have a mix of audio and text, and because every prompt goes through the converter attached to the target, the converter throws an error when it tries to convert the audio_file of the previous turns. You could account for this in the notebook at run time, but it's easier and cleaner to have the converter handle it and simply pass through anything that is already audio. I couldn't really see a downside to having this in the converter, but I'm keen to know if you can think of a problem it may cause.
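
A stripped-down sketch of the scenario may help. Everything here is a stand-in: `ConverterResult` is a local dataclass mimicking the fields used in the diff above, `TextToAudioSketch` and `_synthesize` are hypothetical, and the method is synchronous for brevity. It only shows how a mixed history flows through a converter that passes existing audio through.

```python
from dataclasses import dataclass


@dataclass
class ConverterResult:
    """Local stand-in with the two fields used in the diff above."""
    output_text: str
    output_type: str


class TextToAudioSketch:
    """Hypothetical text-to-audio converter that tolerates audio input."""

    SUPPORTED_INPUT_TYPES = ("text", "audio_path")

    def input_supported(self, input_type: str) -> bool:
        return input_type in self.SUPPORTED_INPUT_TYPES

    def convert(self, prompt: str, input_type: str = "text") -> ConverterResult:
        if not self.input_supported(input_type):
            raise ValueError("Input type not supported")
        # Earlier turns that are already audio are returned unchanged.
        if input_type == "audio_path":
            return ConverterResult(output_text=prompt, output_type="audio_path")
        return ConverterResult(output_text=self._synthesize(prompt), output_type="audio_path")

    def _synthesize(self, text: str) -> str:
        # Placeholder: a real converter would call a speech service and write a file.
        return f"synthesized_{len(text)}_chars.wav"


# Mixed history: one earlier audio turn plus a new text turn.
history = [("previous_turn.wav", "audio_path"), ("new text prompt", "text")]
converter = TextToAudioSketch()
results = [converter.convert(value, kind) for value, kind in history]
```

After the loop, every entry in `results` is an audio path, and the earlier turn keeps its original file.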

logger = logging.getLogger(__name__)


class MultiLanguageTranslationConverter(PromptConverter):
Contributor

This is actually doable with the selective text converter + translation converter, see https://azure.github.io/PyRIT/code/converters/6_selectively_converting.html#example-7-applying-converters-to-different-parts

I do see the appeal of having a shortcut, though. Wdyt?

Contributor Author

@petebryan Feb 20, 2026

Yeah, so I spent some time on this. My reasoning for having a separate converter for this is:

  1. Without digging into the docs a bit, it's not easy to see how to do this type of splitting, so it's not easily discoverable.
  2. Implementing it in a notebook flow is a bit cumbersome, especially when you want to try a lot of different approaches. Having a converter makes it cleaner and easier to implement, but happy to change course if others disagree.
  3. Baking the capability into the RandomTranslationConverter could be doable, but it would add a level of complexity to that converter, so I felt a separate one made sense from a maintainability point of view. Very happy to take guidance on this.

Contributor

I'm wondering if we could do something like:

  1. Wrap the splitting and chaining logic in something like SequenceLevelConverter (effectively a generalized version of WordLevelConverter)
  2. Merge MultiLanguageTranslationConverter and RandomTranslationConverter, maybe inheriting this new SequenceLevelConverter, supporting both fixed/random language selection, and sequence/word splitting.

Contributor

I would like to have @rlundeen2 chime in since he created STC. I briefly considered if this should be a shortcut to do what selective text converter does for this but it feels... not easier? Maybe because I'm already familiar with it. In any case, I don't see a case for having the implementation. At most, it should be an alias for using the selective text converter under the hood.

logger.info(
    "Multi-language translation complete: %d segments across languages %s",
    len(translated_segments),
    self.languages[: len(segments)],
Contributor

nit:

Suggested change
- self.languages[: len(segments)],
+ self.languages[: len(self.languages)],

language = self.languages[i]

system_prompt = self._prompt_template.render_template_value(languages=language)
conversation_id = str(uuid.uuid4())
Contributor

does it matter at all that all of these are going to be part of a different conversation?

Raises:
    ValueError: If speed_factor is not positive.
"""
if speed_factor <= 0:
Contributor

Is there an upper bound?

    info = np.iinfo(data.dtype)
    max_val = float(info.max)
else:
    max_val = 1.0
Contributor

Should info be assigned here? What happens on line 80 if it's not?
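
One way to sidestep the question is to compute every bound inside the branch, so later code never touches `info` directly. The sketch below is a suggestion under assumptions, not the PR's code: `amplitude_bounds` and `clip_to_dtype` are hypothetical helpers, and float audio is assumed to be normalized to [-1.0, 1.0].

```python
from typing import Tuple

import numpy as np


def amplitude_bounds(data: np.ndarray) -> Tuple[float, float]:
    """Return (min_val, max_val) for the array's dtype without leaking `info`."""
    if np.issubdtype(data.dtype, np.integer):
        info = np.iinfo(data.dtype)
        return float(info.min), float(info.max)
    # Assumption: float samples are normalized to [-1.0, 1.0].
    return -1.0, 1.0


def clip_to_dtype(values: np.ndarray, like: np.ndarray) -> np.ndarray:
    """Clip `values` to the valid range of `like` and cast back to its dtype."""
    min_val, max_val = amplitude_bounds(like)
    clipped = np.clip(values, min_val, max_val)
    if np.issubdtype(like.dtype, np.integer):
        clipped = np.rint(clipped)  # round before the integer cast
    return clipped.astype(like.dtype)
```

Because both branches return a complete pair of bounds, a float input can no longer reach code that expects `info` to exist.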

4 participants